
 failure rate



AI chatbots miss urgent issues in queries about women's health

New Scientist

AI models such as ChatGPT and Gemini fail to give adequate advice for 60 per cent of queries relating to women's health in a test created by medical professionals. Many women are using AI for health information, but the answers aren't always up to scratch. Commonly used AI models fail to accurately diagnose or offer advice for many queries relating to women's health that require urgent attention. Thirteen large language models, produced by the likes of OpenAI, Google, Anthropic, Mistral AI and xAI, were given 345 medical queries across five specialities, including emergency medicine, gynaecology and neurology. The queries were written by 17 women's health researchers, pharmacists and clinicians from the US and Europe. The answers were reviewed by the same experts. Any questions that the models failed at were collated into a benchmarking test of AI models' medical expertise that included 96 queries.




Below are our responses to the comments

Neural Information Processing Systems

We would like to thank all the reviewers for recognizing the contributions of our work and providing valuable feedback. Below are our responses to the comments. [...] we follow the kind suggestion from [...] we would like to explain [...] We will revise the paper accordingly and add more explanations to enhance the clarity and readability. Specifically, we use reference models trained on 1) ImageNet and 2) noisy CIFAR-10.1 images (with additive [...] we explain that our method [...] we appreciate the pointer to this contemporaneous work.



Optimizing Data Collection for Machine Learning

Neural Information Processing Systems

Modern deep learning systems require huge data sets to achieve impressive performance, but there is little guidance on how much or what kind of data to collect. Over-collecting data incurs unnecessary present costs, while under-collecting may incur future costs and delay workflows.



Supplementary Materials of Random Noise Defense against Query-Based Black-Box Attacks (Zeyu Qin, Yanbo Fan)

Neural Information Processing Systems

In this supplementary document, we provide additional materials to supplement our main submission. In Section A, we talk about the societal impacts of our work. In Section B, we provide detailed experimental settings as well as further evaluation results on CIFAR-10 and ImageNet. In Section D, we give the proofs w.r.t. [...]. In Section E, we give the proofs w.r.t. [...]. The proofs of Theorem 3 are given in Section F. In Section C, we provide the analysis and evaluation of decision-based attacks.


Random Noise Defense Against Query-Based Black-Box Attacks (Zeyu Qin, Yanbo Fan)

Neural Information Processing Systems

DNN model as well as the training dataset, are often hidden from users. Instead, only the model feedback for each query (e.g., labels or confidence scores) is accessible. In this case, the product providers mainly face severe threats from query-based black-box attacks, which don't require any